Method and apparatus for recognizing speaker by using a resonator

ABSTRACT

Provided are a method and device for recognizing a speaker by using a resonator. The method of recognizing the speaker includes receiving a plurality of electrical signals corresponding to a speech of the speaker from a plurality of resonators having different resonance bands; obtaining a difference of magnitudes of the plurality of electrical signals; and recognizing the speaker based on the difference of magnitudes of the plurality of electrical signals.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/474,465, filed Jun. 27, 2019, which is a U.S. national stage entry under 35 U.S.C. 371(c) of International Application No. PCT/KR2017/015020, filed on Dec. 19, 2017, in the Korean Patent Office, which claims priority from Korean Patent Application No. 10-2016-0182792, filed on Dec. 29, 2016, in the Korean Intellectual Property Office, the disclosures of which are herein incorporated by reference in their entireties.

BACKGROUND

1. Field

Methods and apparatuses consistent with example embodiments relate to a method and an apparatus for recognizing a speaker by using a resonator.

2. Description of the Related Art

Spectrum analyzers that analyze the spectrum of sound or vibration may be used in a variety of devices. For example, spectrum analyzers may be employed in a computer, an automobile, a mobile phone, or a home appliance for speech recognition, speaker recognition, and situation recognition related to sound or vibration. Also, spectrum analyzers may be mounted on buildings, various home appliances, etc., for analyzing vibration information.

Spectrum analyzers may include sensors such as a mechanical resonator, and an electrical analog or digital filter may be used to filter a signal of a frequency band of a specific region. A Fourier transform or the like may be performed by using signals obtained from the sensor to analyze the spectrum.

SUMMARY

One or more example embodiments provide a method of recognizing a speaker by using a resonator with improved accuracy and efficiency.

One or more example embodiments provide an apparatus for recognizing a speaker including a resonator with improved accuracy and efficiency.

According to an aspect of an example embodiment, provided is a method of recognizing a speaker, the method including: receiving a plurality of electrical signals corresponding to a speech of the speaker from a plurality of resonators having different resonance bands; obtaining a difference of magnitudes of the plurality of electrical signals; and recognizing the speaker based on the difference of magnitudes of the plurality of electrical signals.

The difference of magnitudes of the plurality of electrical signals may include a difference of magnitudes of electrical signals output from two resonators having neighboring resonance frequencies.

The recognizing the speaker may include generating a bitmap of a band gradient by encoding the difference of magnitudes of the plurality of electrical signals; and recognizing the speaker based on the bitmap of the band gradient.

The encoding may include converting the difference of magnitudes of the plurality of electrical signals into any one of three or more odd-number of values, by using one or more threshold values.

The three or more odd-number of values may include values that have a same absolute value and opposite signs.

The three or more odd-number of values may include −a, 0, and a (where a is a constant).

The recognizing the speaker may include registering the speaker, the registering including: generating a speaker model based on the bitmap of the band gradient; and registering the speaker model as an authentication template.

The recognizing the speaker may include determining whether the speaker is a registered speaker by performing: generating a characteristic value based on the bitmap of the band gradient; and determining whether the speaker is the registered speaker based on comparison between the characteristic value and the authentication template.

The recognizing the speaker may include determining a vowel included in the speech of the speaker based on the difference of magnitudes of the plurality of electrical signals.

The determining the vowel may include estimating relative positions of formants based on the difference of magnitudes of the plurality of electrical signals; and determining the vowel based on the relative positions of the formants.

A number of the formants may be three.

The difference of magnitudes of the plurality of electrical signals may be determined based on magnitudes of electrical signals received from four resonators of a resonator sensor.

The difference of magnitudes of the plurality of electrical signals may include a plurality of differences of magnitudes of the plurality of electrical signals, and the recognizing the speaker may include: assigning a weight to a model corresponding to the determined vowel in an authentication template; generating a bitmap of a band gradient based on a first difference of magnitudes of the plurality of electrical signals, which is different from a second difference of magnitudes of the plurality of electrical signals used to determine the vowel; generating a characteristic value based on the bitmap of the band gradient; and recognizing whether the speaker is a registered speaker based on comparison between the characteristic value and the authentication template, in which the model corresponding to the determined vowel is assigned the weight.

The assigning the weight may include: assigning the weight to the model corresponding to the determined vowel that is higher than a weight assigned to a model corresponding to another vowel.

The weight assigned to the model corresponding to the determined vowel may be 1 and the weight assigned to the model corresponding to the another vowel may be 0.

A number of the first difference of magnitudes of the plurality of electrical signals used to generate the bitmap of the band gradient may be greater than a number of the second difference of magnitudes of the plurality of electrical signals used to determine the vowel.

According to another aspect of an example embodiment, there is provided an apparatus for recognizing a speaker, the apparatus including: a resonator sensor including a plurality of resonators having different resonance bands, the plurality of resonators configured to output a plurality of electrical signals corresponding to a speech of the speaker; and a processor configured to obtain a difference of magnitudes of the plurality of electrical signals and recognize the speaker based on the difference of magnitudes of the plurality of electrical signals.

The difference of magnitudes of the plurality of electrical signals may include a difference of magnitudes of electrical signals output from two resonators having neighboring resonance frequencies.

The processor may be further configured to generate a bitmap of a band gradient by encoding the difference of magnitudes of the plurality of electrical signals and recognize the speaker based on the bitmap of the band gradient.

The processor may be further configured to encode the difference of magnitudes of the plurality of electrical signals by converting the difference of magnitudes of the plurality of electrical signals into any one of three or more odd-number of values, by using one or more threshold values.

The processor may be further configured to determine whether the speaker is a registered speaker based on comparison between a characteristic value determined based on the bitmap of the band gradient and a registered authentication template.

The processor may be further configured to determine a vowel included in the speech of the speaker based on the difference of magnitudes of the plurality of electrical signals.

The processor may be further configured to estimate relative positions of formants based on the difference of magnitudes of the plurality of electrical signals and determine the vowel based on the relative positions of the formants.

The difference of magnitudes of the plurality of electrical signals may be determined based on magnitudes of electrical signals received from four resonators of the resonator sensor.

The difference of magnitudes of the plurality of electrical signals may include a plurality of differences of magnitudes of the plurality of electrical signals, and the processor may be further configured to: assign a weight to a model corresponding to the determined vowel in an authentication template, generate a characteristic value based on a first difference of magnitudes of the plurality of electrical signals, which is different from a second difference of magnitudes of the plurality of electrical signals used to determine the vowel, and recognize the speaker based on comparison between the characteristic value and the authentication template, in which the model corresponding to the determined vowel is assigned the weight.

The processor may be further configured to assign the weight to the model corresponding to the determined vowel that is higher than a weight assigned to a model corresponding to another vowel.

According to still another aspect of an example embodiment, provided is a method of recognizing a speaker, the method including: receiving signals of a frequency band corresponding to a speech of the speaker; obtaining a difference of magnitudes of the signals; determining a vowel included in the speech of the speaker based on the difference of magnitudes of the signals; and determining whether the speaker is a registered speaker based on the determined vowel.

The determining the vowel may include: estimating relative positions of formants based on the difference of magnitudes of the signals; and determining the vowel based on the relative positions of the formants.

The receiving may include receiving the signals of the frequency band from a plurality of resonators having different resonance bands.

The determining whether the speaker is the registered speaker may include: assigning a weight to a model corresponding to the determined vowel in an authentication template; generating a characteristic value of the speaker corresponding to the speech of the speaker; determining whether the speaker is the registered speaker based on comparison between the characteristic value and the authentication template, in which the model corresponding to the determined vowel is assigned the weight.

The assigning the weight may include: assigning the weight to the model corresponding to the determined vowel to be higher than a weight assigned to a model corresponding to another vowel.

The weight assigned to the model corresponding to the determined vowel may be 1 and the weight assigned to the model corresponding to the another vowel may be 0.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or other aspects of the disclosure will become apparent and more readily appreciated from the following description of example embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a plan view showing a schematic structure of a resonator sensor including a plurality of resonators according to an example embodiment;

FIG. 2 is a cross-sectional view of the resonator shown in FIG. 1, taken along line L1-L2;

FIG. 3 is a block diagram schematically illustrating a speaker recognition apparatus including a resonator according to an example embodiment;

FIG. 4 is a diagram illustrating a speaker recognition method using a resonator according to an example embodiment;

FIG. 5 is an example of a graph showing a speech having different resonance bands;

FIG. 6 is a diagram illustrating an example of generating a bitmap of a band gradient using a magnitude difference of electrical signals corresponding to resonance bands;

FIG. 7 is a graph illustrating an equation for encoding a magnitude difference of electrical signals corresponding to resonance bands according to an embodiment;

FIG. 8 is a diagram showing a bitmap of a two-dimensional band gradient over time according to an example embodiment;

FIG. 9 is a spectrum showing a resonance band of a vowel [AH] pronunciation;

FIG. 10 is a spectrum showing a resonance band of a vowel [EE] pronunciation;

FIGS. 11 and 12 are graphs illustrating estimating a position of a formant using resonators spaced from each other in connection with vowel determination according to an example embodiment;

FIG. 13 is a reference diagram showing positions of formants of a vowel according to an example embodiment;

FIG. 14 is a flowchart illustrating a method of recognizing a speaker using a vowel and a bitmap of a band gradient;

FIG. 15 is a reference diagram for explaining a comparison between speaker characteristic values and authentication templates in a short speech;

FIGS. 16 and 17 are diagrams showing examples in which center frequencies of a plurality of resonators of a resonator sensor are set at an equal interval according to an example embodiment;

FIGS. 18 and 19 are diagrams showing examples in which center frequencies of a plurality of resonators of a resonator sensor are set at a constant interval according to an example embodiment;

FIGS. 20 and 21 are diagrams showing examples in which center frequencies of a plurality of resonators of a resonator sensor are set at an arbitrary interval according to an example embodiment;

FIG. 22 is a plan view showing a schematic structure of a resonator sensor including a plurality of resonators according to an example embodiment;

FIGS. 23, 24, and 25 are graphs illustrating examples of variously changing bandwidths of a plurality of resonators of a resonator sensor according to an example embodiment; and

FIG. 26 is a graph showing that a bandwidth of a specific resonator among a plurality of resonators of a resonator sensor is set wide according to an example embodiment.

DETAILED DESCRIPTION

Hereinafter, example embodiments will be described in detail with reference to the accompanying drawings. In the following drawings, like reference numerals refer to like elements, and the size of each element in the drawings may be exaggerated for clarity and convenience of explanation. The embodiments described below are merely examples, and various modifications are possible. Hereinafter, when a constituent element is disposed “above” or “on” another constituent element, the constituent element may be directly on the other constituent element or may be above the other constituent element in a non-contact manner. Also, the terms “comprises” and/or “comprising” used herein specify the presence of stated features or components, but do not preclude the presence or addition of one or more other features or components unless specifically stated otherwise.

As used herein, expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. For example, the expression, “at least one of a, b, and c,” should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, or all of a, b, and c.

It will be understood that, although the terms first, second, third, etc., may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. In addition, the terms, such as ‘part’ or ‘unit’, etc., should be understood as a unit that performs at least one function or operation and that may be embodied as hardware, software, or a combination thereof.

FIG. 1 is a plan view showing a schematic structure of a resonator sensor including a plurality of resonators according to an example embodiment.

A resonator sensor 100 of FIG. 1 may be used as a spectrum analyzer configured to analyze the spectrum of sound or vibration. The resonator sensor 100 may include a plurality of resonators having different resonance bands, for example, a first resonator R1, a second resonator R2, . . . , an n-th resonator Rn. The number of unit resonators included in the resonator sensor 100 may be two or more and may be determined according to a user selection. However, embodiments are not limited thereto. The resonators R1, R2, . . . , Rn may be formed to have a length of about several millimeters (mm) or less and may be manufactured by using, for example, a micro electro mechanical system (MEMS) process. Each resonator resonates only for a specific frequency band, and this resonance frequency band is referred to as a resonance band.

FIG. 2 is a cross-sectional view of a resonator according to the example embodiment shown in FIG. 1, taken along line L1-L2.

Referring to FIG. 2, the first resonator R1 may include a fixing unit 11 and a supporting unit 14 protruding and extending from the fixing unit 11 in one direction, for example, a y direction. A sensor unit 12 and a mass unit 16 may be formed on the supporting unit 14. The sensor unit 12 may be formed at a first end of the supporting unit 14, for example, in a region adjacent to the fixing unit 11. The mass unit 16 may be formed on an opposite side of the first end of the supporting unit 14, for example, in a region relatively distanced away from the fixing unit 11.

The fixing unit 11 is a region from which the supporting units 14 of the resonators R1, R2, . . . , Rn protrude and may be formed of a material typically used as a substrate of an electronic device. The supporting unit 14 may be formed of Si or the like, may have a shape of a beam or a thin plate elongated in one direction, and may be referred to as a cantilever. A first end of the supporting unit 14 may be fixed by the fixing unit 11, and a second end may freely vibrate in an upward and downward direction, for example, a z direction as shown in FIG. 2, without being fixed by any object. Alternatively, unlike what is shown in FIG. 2, the supporting unit 14 of the resonator may have a shape in which both sides of the supporting unit 14 are fixed to the fixing unit 11 and a center portion of the supporting unit 14 vibrates.

The sensor unit 12 is configured to sense a signal generated due to movement of the supporting units of the resonators R1, R2, . . . , Rn caused by external sound or vibration and may be, for example, a piezo sensor. The sensor unit 12 may include a lower electrode 12a, a piezoelectric material layer 12b, and an upper electrode 12c that are sequentially formed on a surface of the supporting unit 14. The lower electrode 12a and the upper electrode 12c of the sensor unit 12 may be formed of a conductive material, for example, molybdenum (Mo) or the like. An insulating layer may be optionally formed between the lower electrode 12a and the supporting unit 14. The piezoelectric material layer 12b may be used without limitation as long as the piezoelectric material layer 12b includes a piezoelectric material that is usable in the piezo sensor. The piezoelectric material layer 12b may be formed of, for example, AlN, ZnO, SnO, PZT, ZnSnO₃, polyvinylidene fluoride (PVDF), poly(vinylidene fluoride-trifluoroethylene) (P(VDF-TrFE)), or PMN-PT. However, the resonators R1, R2, . . . , Rn are not limited to a piezoelectric type including a piezo sensor, and an electrostatic sensor may also be used.

The material of the mass unit 16 is not limited, and the mass unit 16 may be formed of a metal such as Au.

The configuration shown in FIG. 2, in which the first resonator R1 includes the fixing unit 11, the supporting unit 14, the sensor unit 12, and the mass unit 16, may also apply to the second resonator R2 to the n-th resonator Rn of FIG. 1.

When sound, vibration, or force is applied from the outside to the resonators R1, R2, . . . , Rn shown in FIGS. 1 and 2, an inertial force may be generated according to the behavior of the mass unit 16. When a resonance frequency of the supporting unit 14 and a frequency of external vibration, sound, or force are identical to each other, a resonance phenomenon may occur and the inertial force may increase. Such an inertial force may generate a bending moment in the sensor unit 12. The bending moment may cause stress in each layer of the sensor unit 12. In this case, a charge of a magnitude proportional to the stress that is applied to the sensor unit 12 may be generated in the piezoelectric material layer 12b, and a voltage is generated in inverse proportion to the capacitance between the electrodes 12a and 12c. In summary, when the voltage generated in the sensor unit 12 by an input signal such as speech, vibration, or force from outside the resonators R1, R2, . . . , Rn is detected and analyzed, information about the input signal may be obtained.
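As a non-limiting illustration, the two proportionalities stated above may be written as follows. The symbols are assumptions introduced only for this sketch: σ denotes the stress applied to the sensor unit, Q the generated charge, C the capacitance between the lower and upper electrodes, V the resulting voltage, and k a hypothetical proportionality constant that depends on the piezoelectric material and geometry.

```latex
Q = k\,\sigma, \qquad V = \frac{Q}{C} = \frac{k\,\sigma}{C}
```

Under these assumptions, the detected voltage increases with the stress produced at resonance and decreases as the capacitance between the electrodes increases.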

A frequency band of the input signal sensed by the resonators R1, R2, . . . , Rn may be an audible frequency band in the range of approximately 20 Hz to 20 kHz, but is not limited thereto. Speech of an ultrasonic band of 20 kHz or more or an ultra-low sound band of 20 Hz or less may also be received.

An example embodiment of the disclosure provides an apparatus and a method for recognizing a speaker by using an output value detected by the resonator sensor 100, that is, an electrical signal.

FIG. 3 is a block diagram schematically illustrating a speaker recognition apparatus including a resonator according to an example embodiment.

Referring to FIG. 3, a speaker recognition apparatus 200 includes the resonator sensor 100 shown in FIGS. 1 and 2, which is configured to output an electrical signal of a specific value in response to an input external signal, and a processor 210 configured to obtain a magnitude difference of electrical signals corresponding to resonance bands (or a difference between magnitudes of electrical signals output from resonators having different resonance bands) from the electrical signal received from the resonator sensor 100 and to recognize a speaker using the magnitude difference of electrical signals corresponding to resonance bands.

The resonator sensor 100 may include a plurality of resonators having different resonance frequencies, that is, resonance bands, as shown in FIGS. 1 and 2. Each resonator of the resonator sensor 100 may output an electrical signal corresponding to the input signal. In the resonator sensor 100, a resonator whose resonance band includes a frequency of the input signal may output an electrical signal (for example, a voltage) having a large magnitude, and a resonator whose resonance band does not include a frequency of the input signal may output an electrical signal having a small magnitude. Accordingly, each resonator of the resonator sensor 100 may output an electrical signal corresponding to the input signal, and thus the resonator sensor 100 may output an electrical signal corresponding to the frequency of the input signal.

The resonator sensor 100 may be configured to perform at least a part of functions of the processor 210 described below. For example, the resonator sensor 100 may perform an operation of correcting an electrical signal with respect to a speech of a speaker, obtaining a characteristic of the electrical signal, or the like, in addition to an operation of detecting the speech. In this case, the resonator sensor 100 may be a functional module having a hardware module and a software module.

The processor 210 may drive an operating system and an application program to control a plurality of components connected to the processor 210. The processor 210 may perform speaker recognition using the electrical signal obtained from the resonator sensor 100.

For example, the processor 210 may obtain the magnitude difference of electrical signals corresponding to resonance bands using the electrical signal received from the resonator sensor 100 and encode the obtained magnitude difference of electrical signals corresponding to resonance bands to generate a bitmap of a band gradient. The term “magnitude difference of the resonance bands” may mean a magnitude difference of electrical signals output from resonators having different resonance bands. The bitmap of the band gradient is a map in which the magnitude difference of electrical signals corresponding to resonance bands is simplified and will be described in detail later.

The processor 210 may generate the bitmap of the band gradient from a registration process speech of a specific speaker and may generate a personalized speaker model based on the bitmap of the band gradient. The registration process speech may be a speech that is used for registering the speech of the specific speaker. For example, the processor 210 may generate characteristic values of the registration process speech of the speaker by applying fast Fourier transform (FFT), two-dimensional (2D) discrete cosine transform (DCT), dynamic time warping (DTW), an artificial neural network, vector quantization (VQ), a Gaussian mixture model (GMM), etc., to the bitmap of the band gradient and generate a personalized speaker model from the characteristic values of the registration process speech. The processor 210 may generate the personalized speaker model by applying the characteristic values of the registration process speech to a universal background model (UBM). The generated personalized speaker model may be stored in a security region of a memory 220 as an authentication template for use in comparison with a speech that is input subsequently to determine whether the input speech is a speech of the specific speaker.

When speech authentication is performed, the processor 210 may generate the bitmap of the band gradient from an input speech of an unspecified speaker, generate the characteristic values based on the bitmap of the band gradient, and authenticate the unspecified speaker based on comparison between the characteristic values and the registered authentication template. In this case, the processor 210 may convert the characteristic value of the unspecified speaker for comparison with the registered authentication template and may determine similarity by comparing the converted characteristic value with the registered authentication template. A maximum likelihood estimation method and the like may be used to determine the similarity. The processor 210 may determine that authentication is successful when the similarity is greater than a first reference value and may determine that authentication fails when the similarity is equal to or less than the first reference value. The first reference value may be predefined and have a value based on which it may be determined that the characteristic value of the unspecified speaker corresponds to the authentication template.
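As a non-limiting illustration, the decision rule described above (success when the similarity is greater than the first reference value, failure when it is equal to or less than that value) may be sketched as follows. The `similarity` function here is a hypothetical stand-in that merely counts matching entries; it is not the maximum likelihood estimation mentioned above, and the function names and data layout are assumptions made for this sketch.

```python
from typing import Sequence


def similarity(characteristic: Sequence[int], template: Sequence[int]) -> float:
    """Hypothetical similarity: fraction of positions where both sequences agree."""
    if not template or len(characteristic) != len(template):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for c, t in zip(characteristic, template) if c == t)
    return matches / len(template)


def authenticate(characteristic: Sequence[int],
                 template: Sequence[int],
                 first_reference_value: float) -> bool:
    """Succeed only when the similarity is strictly greater than the reference value."""
    return similarity(characteristic, template) > first_reference_value


if __name__ == "__main__":
    template = [1, 0, -1, 1]      # registered authentication template (illustrative)
    candidate = [1, 0, -1, 0]     # characteristic value of an unspecified speaker
    print(authenticate(candidate, template, first_reference_value=0.7))
```

In this sketch, a similarity exactly equal to the first reference value is treated as a failed authentication, matching the "equal to or less than" condition above.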

Additionally, the processor 210 may obtain the magnitude difference of electrical signals corresponding to resonance bands based on the electrical signal received from the resonator sensor 100 and determine a vowel by using the obtained magnitude difference of electrical signals corresponding to resonance bands. The vowel may include a plurality of formants that are frequency bands in which acoustic energy concentrates. Although a specific formant may vary among speakers, the variance of the specific formant is not so significant as to make it impossible to distinguish the vowel from other vowels. Therefore, a vowel that is pronounced by a plurality of speakers may be generally distinguished regardless of the speakers, and a model corresponding to the determined vowel in the authentication template may be used for speaker recognition. A vowel determination method will be described later.

The speaker recognition apparatus 200 may include the memory 220 configured to store the authentication template. The memory 220 may temporarily store information about the speech of the unspecified speaker.

Also, the speaker recognition apparatus 200 may further include a display 230 configured to display information and the like. The display 230 may display various kinds of information about speech recognition, for example, a user interface for recognition, an indicator indicating recognition results, and the like.

FIG. 4 is a diagram illustrating a speaker recognition method using a resonator according to an example embodiment.

Referring to FIG. 4, in the speaker recognition method according to an example embodiment, the processor 210 may receive an electrical signal corresponding to a speech of a speaker from the resonator sensor 100 (S310). Each resonator of the resonator sensor 100 may output the electrical signal corresponding to the speech, and the processor 210 may receive the electrical signal.

The processor 210 may calculate a magnitude difference of electrical signals corresponding to resonance bands by using the electrical signal received from the resonator sensor 100 (S320). The magnitude difference of electrical signals corresponding to resonance bands may be a magnitude difference of electrical signals received from different resonators, for example, a magnitude difference of electrical signals output from two resonators having neighboring resonance frequencies.

The processor 210 may calculate the magnitude difference of electrical signals corresponding to resonance bands by using all of the resonators included in the resonator sensor 100. In FIG. 1, when first to n-th resonators have sequentially varying resonance bands, the processor 210 may obtain a magnitude difference between electrical signals received from the first resonator and the second resonator as a first magnitude difference of the resonance bands, obtain a magnitude difference between electrical signals received from the second resonator and a third resonator as a second magnitude difference of the resonance bands, and obtain a magnitude difference between electrical signals received from an (n−1)-th resonator and the n-th resonator as an (n−1)-th magnitude difference of the resonance bands.

The processor 210 may calculate the magnitude difference of electrical signals corresponding to resonance bands by using only some of the resonators included in the resonator sensor 100. For example, the processor 210 may calculate the magnitude difference of electrical signals corresponding to resonance bands by using electrical signals received from the first resonator, a fourth resonator, a k-th resonator, and the n-th resonator. When the resonance bands of the first resonator and the fourth resonator are adjacent to each other, the resonance bands of the fourth resonator and the k-th resonator are adjacent to each other, and the resonance bands of the k-th resonator and the n-th resonator are adjacent to each other, the processor 210 may calculate a difference between electrical signals received from the first resonator and the fourth resonator as a first magnitude difference of the resonance bands, calculate a difference between electrical signals received from the fourth resonator and the k-th resonator as a second magnitude difference of the resonance bands, and calculate a difference between electrical signals received from the k-th resonator and the n-th resonator as a third magnitude difference of the resonance bands.
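As a non-limiting illustration, both cases described above (using all of the resonators, or only some of them) may be sketched with one helper. The function name, the zero-based indexing, and the sign convention (the higher-band magnitude minus the lower-band magnitude) are assumptions made for this sketch; the description above does not fix a sign.

```python
from typing import List, Optional, Sequence


def band_differences(magnitudes: Sequence[float],
                     selected: Optional[Sequence[int]] = None) -> List[float]:
    """Differences between magnitudes of resonators with neighboring resonance bands.

    magnitudes[i] is the output magnitude of the (i+1)-th resonator, ordered by
    resonance frequency. `selected` holds zero-based indices of the resonators
    to use; None means all resonators are used.
    """
    if selected is None:
        selected = range(len(magnitudes))
    # Keep the chosen resonators in order of resonance frequency.
    chosen = [magnitudes[i] for i in sorted(selected)]
    # Assumed sign convention: later (higher-band) minus earlier (lower-band).
    return [later - earlier for earlier, later in zip(chosen, chosen[1:])]


if __name__ == "__main__":
    outputs = [1.0, 4.0, 2.0, 2.5, 3.0, 0.5]      # illustrative magnitudes for n = 6
    print(band_differences(outputs))               # n - 1 = 5 differences
    print(band_differences(outputs, [0, 3, 4, 5])) # first, fourth, k-th (k = 5), n-th
```

With n resonators, the first call yields n−1 differences, and the second call yields three differences from the four selected resonators, as in the example above.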

The processor 210 may recognize a speaker using the calculated magnitude difference of electrical signals corresponding to resonance bands (S330). For example, the processor 210 may generate a bitmap of a band gradient by encoding the magnitude difference of electrical signals corresponding to resonance bands, generate a characteristic value of the speech of the speaker using the bitmap of the band gradient, and recognize the speaker by comparing the generated characteristic value to the stored authentication template. The bitmap of the band gradient is a map in which the magnitude difference of electrical signals corresponding to resonance bands is simplified. The bitmap of the band gradient will be described in detail later.
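As a non-limiting illustration, the encoding step may be sketched as a conversion of each magnitude difference into one of the three values −a, 0, and a by using a threshold, with one encoded row per time frame forming the bitmap. The symmetric thresholds (+T and −T), the function names, and the frame layout are assumptions made for this sketch and are not taken from the encoding equation of the disclosure.

```python
from typing import List, Sequence


def encode_gradient(differences: Sequence[float],
                    threshold: float,
                    a: int = 1) -> List[int]:
    """Map each magnitude difference to -a, 0, or a using symmetric thresholds."""
    encoded = []
    for d in differences:
        if d > threshold:
            encoded.append(a)       # clearly rising between neighboring bands
        elif d < -threshold:
            encoded.append(-a)      # clearly falling between neighboring bands
        else:
            encoded.append(0)       # change too small to count
    return encoded


def band_gradient_bitmap(frames: Sequence[Sequence[float]],
                         threshold: float,
                         a: int = 1) -> List[List[int]]:
    """Encode one row of differences per time frame into a two-dimensional bitmap."""
    return [encode_gradient(frame, threshold, a) for frame in frames]


if __name__ == "__main__":
    frames = [[3.0, -2.0, 0.5], [0.2, 1.5, -4.0]]   # illustrative differences
    for row in band_gradient_bitmap(frames, threshold=1.0):
        print(row)
```

Using more thresholds would yield five or more odd-numbered values in the same manner.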

Additionally, the processor 210 may determine a vowel using the magnitude difference of electrical signals corresponding to resonance bands. The determined vowel may be used to determine whether an uttering speaker is a registered speaker. For example, models corresponding to the determined vowel among personalized speaker models included in the authentication template may be weighted, or only the models corresponding to the determined vowel may be used for speaker recognition. The speaker recognition apparatus 200 may recognize the speaker by using the magnitude difference of electrical signals corresponding to resonance bands. A method of using the magnitude difference of electrical signals corresponding to resonance bands may effectively remove common noise between resonance frequencies.
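As a non-limiting illustration, weighting the models that correspond to the determined vowel may be sketched as follows. The per-vowel similarity scores, the vowel labels, the function names, and the normalization by the total weight are assumptions made for this sketch. With weights of 1 and 0, only the model corresponding to the determined vowel contributes to the result.

```python
from typing import Dict, Iterable


def vowel_weights(vowels: Iterable[str], determined: str,
                  high: float = 1.0, low: float = 0.0) -> Dict[str, float]:
    """Assign the higher weight to the determined vowel's model and the lower to the rest."""
    return {v: (high if v == determined else low) for v in vowels}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine hypothetical per-vowel-model similarity scores using the weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one model must have a non-zero weight")
    return sum(weights[v] * scores[v] for v in scores) / total


if __name__ == "__main__":
    # Illustrative similarity of the input speech to each vowel model of the template.
    scores = {"AH": 0.2, "EE": 0.9, "OO": 0.4}
    weights = vowel_weights(scores, determined="EE")
    print(weighted_score(scores, weights))
```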

FIG. 5 is an example of a graph showing a speech having different resonance bands.

By using a magnitude difference of electrical signals corresponding to resonance bands to identify a center frequency of the resonance bands, a region hatched in FIG. 5 may be removed. The hatched region is a frequency region that is weak in relation to the center frequency of the resonance bands and may correspond to noise. Thus, a common noise having a weak correlation with the center frequency may be effectively removed by using the magnitude difference of electrical signals corresponding to resonance bands. By removing the common noise in this manner, various algorithms for noise removal may not be needed or only a simple algorithm may be used for further noise removal, thereby performing speech recognition more efficiently. In other words, the magnitude difference of electrical signals corresponding to resonance bands according to an example embodiment may be used to omit or simplify a preprocessing process for noise removal.

FIG. 6 is a diagram illustrating an example of generating a bitmap of a band gradient using a magnitude difference of electrical signals corresponding to resonance bands.

Referring to FIGS. 1 and 6, each of the resonators R1, R2, . . . Rn of the resonator sensor 100 may output an electrical signal in response to a speaker's voice. Each of the resonators R1, R2, . . . Rn may have a resonance frequency as shown in (a) of FIG. 6. A plurality of resonance frequencies may be mixed in the speech of a speaker. Each resonator may output an electrical signal corresponding to a frequency included in the speech of the speaker. For example, when the speech of the speaker includes a first frequency H1, the first resonator R1 may resonate and output an electrical signal of a large magnitude.

The processor 210 may calculate the magnitude difference of the resonance band as shown in (b) of FIG. 6 by using the electrical signal received from the resonator sensor 100. The processor 210 may calculate the magnitude difference of electrical signals corresponding to resonance bands using the electrical signal output from neighboring resonators based on the resonance frequency. (b) of FIG. 6 shows a result of calculating the magnitude difference of electrical signals corresponding to resonance bands by using all of the resonators included in the resonator sensor 100. In (a) of FIG. 6, first to n-th resonators have sequentially varying resonance bands, and thus the processor 210 may calculate a magnitude difference of electrical signals of neighboring resonators among the first to n-th resonators as the magnitude difference of electrical signals corresponding to resonance bands. For example, a first magnitude difference G1 of the resonance bands is a magnitude difference between electrical signals received by the first resonator R1 and the second resonator R2, a second magnitude difference G2 of the resonance bands is a magnitude difference between electrical signals received by the second resonator and a third resonator, and a third magnitude difference G3 of the resonance bands is a magnitude difference of electrical signals received by the third resonator and a fourth resonator. An (n−1)-th magnitude difference Gn−1 of the resonance bands is a magnitude difference between electrical signals received by an (n−1)-th resonator and the n-th resonator.

The processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands as shown in (c) of FIG. 6. For example, the processor 210 may encode a difference of speech using the following equation,

$T_{k} = \begin{cases} 1 & H_{k+1}(\omega) - H_{k}(\omega) \geq \alpha \\ 0 & -\alpha \leq H_{k+1}(\omega) - H_{k}(\omega) < \alpha \\ -1 & H_{k+1}(\omega) - H_{k}(\omega) < -\alpha \end{cases} \qquad \text{[Equation 1]}$

wherein H_(k) denotes a band characteristic (i.e., an electrical signal) of a k-th resonator, H_(k+1) denotes a band characteristic of a (k+1)-th resonator, and T_(k) denotes a value obtained by encoding a characteristic difference between the k-th resonator and the (k+1)-th resonator. The encoded value is referred to as a bit value of the resonance band. α denotes an arbitrary constant and may be determined according to an embodiment.

FIG. 7 is a graph illustrating an equation for encoding a magnitude difference of electrical signals corresponding to resonance bands according to an embodiment.

In FIG. 7, α and −α denote threshold values. An encoding value for a speech of a speaker may vary depending on the magnitude of a threshold value. Referring to Equation 1 and FIG. 7, with respect to the speech from a speaker, the processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands as three resultant values −1, 0, +1 by expressing a difference in output values between the resonators R1, R2 . . . Rn having adjacent resonance bands as 1 when the difference is equal to or greater than a specified value α, as −1 when the difference is less than −α, and as 0 when the difference is less than α and equal to or greater than −α.
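As a minimal sketch of Equation 1, the three-valued encoding may be written as below. The function names and the sample magnitudes are assumptions made for illustration. The boundary handling follows Equation 1 as written: a difference equal to α encodes to 1, and a difference equal to −α encodes to 0.

```python
def encode_gradient(h_k, h_k1, alpha):
    """Encode the difference H_(k+1) - H_(k) as 1, 0, or -1 per Equation 1."""
    diff = h_k1 - h_k
    if diff >= alpha:
        return 1
    if diff < -alpha:
        return -1
    return 0  # -alpha <= diff < alpha


def band_gradient_bits(outputs, alpha):
    """Bit values T_1 .. T_(n-1) for outputs ordered by resonance frequency."""
    return [encode_gradient(lo, hi, alpha) for lo, hi in zip(outputs, outputs[1:])]


# Hypothetical magnitudes for six resonators and a hypothetical threshold.
bits = band_gradient_bits([2.0, 9.0, 4.0, 5.0, 1.0, 3.0], alpha=3.0)
```

Raising α in this sketch maps more of the differences to 0, which is consistent with the statement that the encoding value varies with the magnitude of the threshold value.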

In (c) of FIG. 6, encoding values of the magnitude difference of electrical signals corresponding to resonance bands may be obtained by using Equation 1 (that is, 0, −1, 0, . . . , −1) corresponding to respective frequency regions between neighboring resonators (e.g., a frequency region between the first resonator R1 and the second resonator R2, a frequency region between the second resonator R2 and the third resonator, a frequency region between the third resonator and the fourth resonator, . . . , a frequency region between the (n−1)-th resonator and the n-th resonator). (d) of FIG. 6 is a graph showing the bit values shown in (c) of FIG. 6. The maximum magnitude and the minimum magnitude of electrical signals output from the resonator sensor 100 differ by about 100 times as shown in (b) of FIG. 6. However, when a signal output from the resonator sensor 100 is converted into a bit value of a band gradient, the bit value may be simplified to 8 levels as shown in (d) of FIG. 6.

In FIG. 6, the processor 210 encodes the magnitude difference of electrical signals corresponding to resonance bands to one of values of −1, 0, and 1, but this is merely an example. The processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands in various forms. For example, the processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands to any one of three or more odd-numbered values. With one of the three or more odd-numbered values taken as a reference, the remaining values may be paired such that corresponding values have the same absolute value and opposite signs. For example, the processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands to one of values of −2, −1, 0, 1, and 2. Alternatively, the processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands to any one of even-numbered values. The corresponding values of the even-numbered values may have the same absolute value and opposite signs. For example, the processor 210 may encode the magnitude difference of electrical signals corresponding to resonance bands to one of values of −3, −1, 1, and 3.
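One possible way to obtain an odd number of levels such as −2, −1, 0, 1, and 2 is to use more than one threshold value. The sketch below is an assumption made for illustration: the function name `encode_multilevel` and the threshold values are hypothetical, and the boundary handling simply extends the convention of the three-valued case (a difference equal to a positive threshold counts toward the positive level, and a difference equal to a negative threshold does not count toward the negative level).

```python
def encode_multilevel(diff, thresholds):
    """Encode a magnitude difference into one of 2 * len(thresholds) + 1 levels.

    thresholds -- positive threshold values (hypothetical), e.g. [1.0, 3.0]
    """
    # Number of positive thresholds reached by the difference.
    up = sum(1 for t in thresholds if diff >= t)
    # Number of negative thresholds the difference falls below.
    down = sum(1 for t in thresholds if diff < -t)
    # For positive thresholds, at most one of up and down is non-zero.
    return up - down


levels = [encode_multilevel(d, [1.0, 3.0]) for d in [7.0, -5.0, 1.0, -4.0, 2.0, 0.5]]
```

With a single threshold, this sketch reduces to the three-valued encoding of −1, 0, and 1.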

When the above-described operation is performed with respect to all of the electrical signals output from the resonator sensor 100, a bitmap of a two-dimensional band gradient over time may be generated. The bitmap of the two-dimensional band gradient differs depending on a speaker and thus may be a characteristic that is used for speaker recognition.

FIG. 8 is a diagram showing a bitmap of a two-dimensional band gradient over time according to an example embodiment.

As shown in FIG. 8, the bitmap of the band gradient may be generated for each time frame. The processor 210 may generate the bitmap of the band gradient according to a frame of a predetermined time unit, but is not limited thereto. When the bitmap of the band gradient is generated in the predetermined time unit and successively generated bitmaps have the same value, only one bitmap may be used to perform speaker recognition. For example, a speaker may utter a syllable ‘u’ for two seconds and a plurality of bitmaps of the band gradient may be generated according to a frame of a predetermined time unit. In such a case, the processor 210 may use the plurality of bitmaps of the band gradient generated during the utterance for two seconds for speaker recognition, or alternatively, may remove the same bitmaps from among the plurality of bitmaps of the band gradient generated during the utterance for two seconds and may use only a bitmap that is not the same for speaker recognition. A method of generating a bitmap of the two-dimensional band gradient may vary according to the utilization of recognition.
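The frame-by-frame generation and the optional removal of repeated bitmaps may be sketched as follows. This is an illustrative sketch under stated assumptions: each frame is assumed to be a list of resonator output magnitudes ordered by resonance frequency, the three-valued encoding is used, and the function names and sample frames are hypothetical.

```python
def band_gradient_bitmap(frames, alpha):
    """Build a two-dimensional bitmap: one row of bit values per time frame."""
    bitmap = []
    for outputs in frames:
        row = []
        for lo, hi in zip(outputs, outputs[1:]):
            diff = hi - lo
            row.append(1 if diff >= alpha else -1 if diff < -alpha else 0)
        bitmap.append(row)
    return bitmap


def drop_repeated_frames(bitmap):
    """Keep only one row out of each run of successive identical rows."""
    kept = []
    for row in bitmap:
        if not kept or row != kept[-1]:
            kept.append(row)
    return kept


# Hypothetical outputs of three resonators over four time frames.
frames = [[2.0, 9.0, 4.0], [2.0, 9.0, 4.0], [3.0, 9.0, 5.0], [9.0, 2.0, 8.0]]
bitmap = band_gradient_bitmap(frames, alpha=3.0)
compact = drop_repeated_frames(bitmap)
```

In this sketch the third frame has different magnitudes from the first two but the same bit values, so it is also dropped, which reflects how the encoding discards small variations in magnitude.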

The processor 210 may register a speech of a speaker by generating a personalized speaker model of a specific speaker using the bitmap of the band gradient and storing the personalized speaker model as an authentication template. When a speech of an unspecified speaker is received, the speech of the unspecified speaker may be compared with a previously stored authentication template to determine a similarity, and whether the unspecified speaker is the same as a registered speaker may be determined based on the similarity.

For example, when a word ‘start’ is used to register a speech for recognition, a specific speaker may utter the word ‘start’. Each or some of resonators of the resonator sensor 100 may output an electrical signal corresponding to the utterance of the word ‘start’. The processor 210 may calculate and encode a magnitude difference of electrical signals corresponding to resonance bands from the electrical signal received from the resonator sensor 100, generate a bitmap of the band gradient, and then calculate a personalized characteristic value corresponding to the utterance of the word ‘start’ by using the bitmap of the band gradient, generate a personalized speaker model with the personalized characteristic value, and register the personalized speaker model as an authentication template. Then, when the unspecified speaker utters the word ‘start’, the processor 210 may generate a bitmap of a band gradient corresponding thereto and calculate characteristic values corresponding to the utterance of the word ‘start’ by the unspecified speaker by using the bitmap. The processor 210 may convert the characteristic values into a form that may be compared with the authentication template, compare the personalized speaker model of the converted form with the authentication template, and perform speaker recognition by determining whether the unspecified speaker is a registered speaker.

As described above, when speaker recognition is performed by using the band gradient, that is, the magnitude difference of electrical signals corresponding to resonance bands, a processing process may be simplified as compared with speech processing using STFT (Short Time Fourier Transform) and MFCC (Mel Frequency Cepstrum Coefficients).

A speaker recognition method according to an example embodiment may additionally use a vowel. Vowels may include formants that are constituent phonemes. Here, the formant means a distribution of the frequency intensity of acoustic energy generated due to a cavity resonance phenomenon caused by a shape, a size, and the like of a passage of a person's pronunciation organ, that is, a frequency band in which acoustic energy concentrates.

FIGS. 9 and 10 are graphs showing an energy distribution of a specific vowel in a speech model. FIG. 9 is a spectrum showing a resonance band of a vowel [AH] pronunciation. FIG. 10 is a spectrum showing a resonance band of a vowel [EE] pronunciation.

Referring to the spectrums of the vowels shown in FIGS. 9 and 10, it may be seen that not one resonance band but several resonance bands exist. Depending on a speaker, the spectrums of the vowel [AH] pronunciation and the vowel [EE] pronunciation may be different. However, such a change in the spectrum depending on the speaker is not significant enough to make it impossible to distinguish the vowel [AH] from the vowel [EE]. This applies equally to pronunciation of other vowels. In other words, vowels may be generally distinguished from each other, despite a speech characteristic of each individual speaker.

Resonance bands in a vowel may be referred to as a first formant F1, a second formant F2, and a third formant F3 in order from the lower frequency side. A center frequency of the first formant F1 is the smallest. A center frequency of the third formant F3 is the largest. A center frequency of the second formant F2 may have a magnitude between the first formant F1 and the third formant F3. Upon comparing a speech of the speaker with an output of each of the resonators R1, R2, . . . Rn of the resonator sensor 100 shown in FIG. 1, a center frequency of the speech may be determined and locations of the first formant F1, the second formant F2, and the third formant F3 may be obtained. When the locations of the first formant F1, the second formant F2, and the third formant F3 are obtained, a vowel in the speech from the speaker may be obtained.

FIGS. 11 and 12 are graphs illustrating estimating a position of a formant using resonators spaced from each other in connection with vowel determination according to an example embodiment.

Two different resonators among the resonators R1, R2, . . . Rn of the resonator sensor 100 shown in FIG. 1 may output an electrical signal corresponding to an input signal from a speaker. The two resonators that are spaced apart from each other may be adjacent or non-adjacent to each other. Referring to FIG. 11, a first resonator having a resonance frequency ω_(a) and a second resonator having a resonance frequency ω_(e) may output electrical signals of different magnitudes corresponding to the input signal of the speaker. For example, when the center frequency of the speech is ω_(a), an output value H₁(ω) of the first resonator may be very large, and an output value H₂(ω) of the second resonator may be absent or very small. When the center frequency of the speech is ω_(c), both the output value H₁(ω) of the first resonator and the output value H₂(ω) of the second resonator may be very small. When the center frequency of the speech is ω_(e), the output value H₁(ω) of the first resonator may be absent or very small, and the output value H₂(ω) of the second resonator may be very large.

In other words, when the center frequency of the speech has a value such as ω_(a), ω_(b), ω_(c), ω_(d) or ω_(e) etc., the output values of the first resonator and the second resonator are different from each other. Therefore, it may be seen that a difference H₂(ω)−H₁(ω) between the output values of the first resonator and the second resonator also varies with the center frequency of the speech as shown in FIG. 12. Thus, the center frequency of the speech may be determined inversely from the difference between the output values of the two resonators. That is, the formant which is the center frequency of the speech may be determined using a magnitude difference of electrical signals corresponding to resonance bands between resonators, and a vowel may be determined from a position of the center frequency.
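The inverse determination may be illustrated with a simple numerical sketch. The embodiments do not specify the response curve of a resonator, so the sketch assumes a normalized bell-shaped response (a Lorentzian form chosen only for illustration) and a single-tone input of unit amplitude. Under that assumption, the difference H₂(ω)−H₁(ω) increases monotonically between the two resonance frequencies and can be inverted by bisection. All names and numerical values are hypothetical.

```python
def response(center, bandwidth, freq):
    """Assumed normalized resonator response (Lorentzian shape, illustrative only)."""
    return 1.0 / (1.0 + ((freq - center) / bandwidth) ** 2)


def output_difference(freq, f1, f2, bandwidth):
    """Difference H2 - H1 for a tone at freq, with resonance frequencies f1 < f2."""
    return response(f2, bandwidth, freq) - response(f1, bandwidth, freq)


def estimate_center_frequency(measured_diff, f1, f2, bandwidth, tol=1e-6):
    """Invert the difference curve on [f1, f2] by bisection."""
    lo, hi = f1, f2
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The difference grows with frequency on [f1, f2] under the assumed model.
        if output_difference(mid, f1, f2, bandwidth) < measured_diff:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical case: resonators at 300 Hz and 810 Hz, speech centered at 500 Hz.
measured = output_difference(500.0, 300.0, 810.0, 200.0)
estimated = estimate_center_frequency(measured, 300.0, 810.0, 200.0)
```

In practice the input amplitude is not known in advance, so an actual implementation would likely need some form of normalization; that step is outside the scope of this sketch.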

The vowel generally includes three formants. The processor 210 may select four resonators of the resonator sensor 100 and determine the formants using electrical signals output from the selected resonators.

FIG. 13 is a reference diagram showing positions of formants of a vowel according to an example embodiment.

Referring to FIG. 13, the horizontal axis indicates types of vowels and the vertical axis indicates center frequencies of the first formant F1, the second formant F2, and the third formant F3 according to the vowels. Positions of the first formant F1, the second formant F2, and the third formant F3 according to the vowels shown in FIG. 13 may use position data of formants of generally known vowels. For example, the positions of the formants of the vowels may be obtained using a vowel information database built from various speakers, which may be referred to as a universal background model (UBM).

As shown in FIG. 13, it may be seen that each vowel generally includes three formants. It may be seen that the positions of the formants are different according to the vowels. A formant having the lowest center frequency among the three formants may be referred to as the first formant, a formant having the highest center frequency as the third formant, and a formant having an intermediate center frequency as the second formant.

In order to determine the three formants, the processor 210 may select four resonators having different resonance frequencies from the resonator sensor 100 shown in FIG. 1. When selecting the four resonators, the processor 210 may select any one of the resonators having a resonance frequency lower than a center frequency of the first formant as a first resonator, any one of the resonators having a resonance frequency between the center frequency of the first formant and a center frequency of the second formant as a second resonator, any one of the resonators having a resonance frequency between the center frequency of the second formant and a center frequency of the third formant as a third resonator, and any one of the resonators having a resonance frequency higher than the center frequency of the third formant as a fourth resonator. For example, the processor 210 may select four resonators having resonance frequencies of about 300 Hz, about 810 Hz, about 2290 Hz, and about 3000 Hz, respectively.
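The selection rule may be sketched as follows. Because the description allows any one resonator within each frequency range, this sketch simply takes the first candidate found in each range; that choice, the function name, and the formant center frequencies of 500 Hz, 1500 Hz, and 2500 Hz used in the example are assumptions made for illustration only.

```python
def select_four_resonators(resonance_freqs, f1, f2, f3):
    """Pick one resonance frequency below F1, between F1 and F2,
    between F2 and F3, and above F3 (formant centers f1 < f2 < f3)."""
    groups = [
        [f for f in resonance_freqs if f < f1],
        [f for f in resonance_freqs if f1 < f < f2],
        [f for f in resonance_freqs if f2 < f < f3],
        [f for f in resonance_freqs if f > f3],
    ]
    if any(not group for group in groups):
        raise ValueError("no resonator available in at least one frequency range")
    # Any member of each group would satisfy the rule; take the first one.
    return tuple(group[0] for group in groups)


# Resonance frequencies from the example above, with hypothetical formant centers.
chosen = select_four_resonators([300.0, 810.0, 2290.0, 3000.0], 500.0, 1500.0, 2500.0)
```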

The processor 210 may determine the first through third formants using a difference between output values of two resonators having neighboring resonance bands among the four resonators. For example, the processor 210 may determine the first formant by a difference H₂(ω)−H₁(ω) between output values of the first and second resonators and the second formant by a difference H₃(ω)−H₂(ω) between output values of the second and third resonators. The processor 210 may determine the third formant by a difference H₄(ω)−H₃(ω) between output values of the third and fourth resonators. The processor 210 may determine the first through third formants respectively from the difference H₂(ω)−H₁(ω) between the output values of the first and second resonators, the difference H₃(ω)−H₂(ω) between the output values of the second and third resonators, and the difference H₄(ω)−H₃(ω) between the output values of the third and fourth resonators and determine an uttered vowel using the first through third formants regardless of who is a speaker. The determined vowel may be used to determine whether an uttering speaker is a registered speaker. Specifically, only a model corresponding to the determined vowel among personalized speaker models included in an authentication template may be used for speaker recognition.
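Once the three formant positions have been estimated, one simple way to decide the vowel is to pick the reference vowel whose formant positions are nearest. The sketch below is only an assumption about how such a lookup could work: the embodiments do not prescribe a nearest-match rule, and the reference values in `REFERENCE_FORMANTS` are rough illustrative figures in Hz, not data taken from FIG. 13.

```python
# Illustrative formant centers (F1, F2, F3) in Hz; placeholder values only.
REFERENCE_FORMANTS = {
    "AH": (730.0, 1090.0, 2440.0),
    "EE": (270.0, 2290.0, 3010.0),
    "OO": (300.0, 870.0, 2240.0),
}


def determine_vowel(formants, reference=REFERENCE_FORMANTS):
    """Return the reference vowel with the smallest squared distance in formant space."""
    def squared_distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(formants, candidate))

    return min(reference, key=lambda vowel: squared_distance(reference[vowel]))


# Hypothetical estimated formant positions for two utterances.
vowel_a = determine_vowel((280.0, 2250.0, 2900.0))
vowel_b = determine_vowel((700.0, 1100.0, 2500.0))
```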

FIG. 14 is a flowchart illustrating a method of recognizing a speaker using a vowel and a bitmap of a band gradient.

Referring to FIG. 14, the processor 210 may receive an electrical signal corresponding to a speech of the speaker from the resonator sensor 100 (S1110). For example, the speaker may utter ‘we’ and the resonator sensor 100 may output an electrical signal corresponding to ‘we’ such that the processor 210 may receive the electrical signal corresponding to ‘we’.

The processor 210 may calculate a magnitude difference of electrical signals corresponding to resonance bands using the electrical signal received from some resonators (S1120). Some resonators may be predefined to determine formants of the vowel. For example, the processor 210 may calculate the magnitude difference of electrical signals corresponding to resonance bands using electrical signals received from four predetermined resonators to determine the three formants in the manner described above.

The processor 210 may determine the vowel using a magnitude difference of electrical signals corresponding to resonance bands of some resonators (S1130). For example, the processor 210 may determine first through third formants using a magnitude difference of the four resonator bands and determine the vowel using relative position relationships of the first through third formants. When the vowel is determined, the graph shown in FIG. 13 may be used. For example, the processor 210 may determine vowels ‘u’ and ‘i’ in time sequence using the relative position relationships of the first through third formants.

The processor 210 may assign a weight to the determined vowel (S1140). For example, the processor 210 may set the weight of the determined vowel to a higher value than the weights of other vowels.

The processor 210 may generate the bitmap of the band gradient using the electrical signal received from all of the resonators included in the resonator sensor 100 (S1150). Specifically, the processor 210 may calculate and encode the magnitude difference of electrical signals corresponding to resonance bands using the electrical signal received from all of the resonators of the resonator sensor 100 to generate the bitmap of the band gradient. In operation S1150, the processor 210 generates the bitmap of the band gradient using the electrical signal received from all of the resonators. However, the processor 210 may generate the bitmap of the band gradient using the electrical signal received from some resonators. The bitmap of the band gradient needs to include more detailed information about the speech of the speaker than the vowel determination, and thus the number of bitmaps of the band gradient may be greater than the number of resonators used for vowel determination.

The processor 210 may generate a speaker characteristic value using the generated bitmap of the band gradient (S1160). The processor 210 may generate the speaker characteristic value from the bitmap of the band gradient using fast Fourier transform (FFT), 2D discrete cosine transform (DCT), dynamic time warping (DTW), an artificial neural network, vector quantization (VQ), a Gaussian mixture model (GMM), etc. The speaker characteristic value may be converted into a form that may be compared with an authentication template. During this conversion process, the processor 210 may use a universal background model (UBM).

The processor 210 may recognize the speaker by comparing the converted speaker characteristic value with the authentication template using the weight (S1170). The processor 210 may apply a high weight to a model corresponding to a determined vowel component of the authentication template and a low weight to other vowel components. For example, when determined vowels are [u] and [i], the processor 210 may apply high weights to models corresponding to components of [u] and [i] in the authentication template and apply a low weight to other components to compare the converted speaker characteristic value with the authentication template. When a comparison result (or a similarity) is equal to or greater than a reference value, the processor 210 may determine the uttering speaker as a registered speaker, and when the comparison result is less than the reference value, the processor 210 may determine that the uttering speaker is not the registered speaker.

The assigned weight may be 1 or 0. In other words, the processor 210 may use only a model corresponding to the determined vowel of the authentication template for comparison.
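The weighted comparison may be sketched as below. The sketch assumes that a similarity score has already been computed between the speaker characteristic value and each vowel model of the authentication template, and it combines those scores as a weighted average; the averaging rule, the function names, and all numerical values are assumptions made for illustration and are not specified in the embodiments.

```python
def weighted_similarity(model_scores, determined_vowels, high=1.0, low=0.1):
    """Weighted average of per-vowel similarity scores.

    model_scores      -- hypothetical mapping of vowel -> similarity score
    determined_vowels -- vowels determined from the utterance
    """
    total = 0.0
    weight_sum = 0.0
    for vowel, score in model_scores.items():
        weight = high if vowel in determined_vowels else low
        total += weight * score
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("all weights are zero")
    return total / weight_sum


def is_registered(model_scores, determined_vowels, reference, high=1.0, low=0.1):
    """Registered speaker when the similarity is equal to or greater than the reference."""
    return weighted_similarity(model_scores, determined_vowels, high, low) >= reference


# Hypothetical scores for an utterance of 'we', in which [u] and [i] are determined.
scores = {"u": 0.9, "i": 0.8, "a": 0.2, "e": 0.1}
with_binary_weights = weighted_similarity(scores, {"u", "i"}, high=1.0, low=0.0)
without_weights = weighted_similarity(scores, {"u", "i"}, high=1.0, low=1.0)
```

In this hypothetical case, weights of 1 and 0 restrict the comparison to the models of the uttered vowels, so the similarity is not diluted by models of vowels that were never uttered.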

FIG. 15 is a reference diagram for explaining a comparison between speaker characteristic values and authentication templates in a short speech.

In FIG. 15, hatched areas indicate UBM models, “+” pattern areas indicate personalized speaker models, that is, the registered authentication templates, and ♦ indicate the speaker characteristic values. For example, when a speaker shortly utters ‘we’, the processor 210 may obtain [u] and [i] as uttered vowel components. When the processor 210 generates speaker characteristic values 1230, the vowel components of [u] and [i] may be characteristics indicating the speaker. Accordingly, when determining a similarity with respect to the speaker characteristic values 1230, by applying a high weight to a model 1210 corresponding to the vowel components of [u] and [i] in the authentication templates and applying a low weight to a model 1220 corresponding to other vowel components, an influence of the uttered vowel component is high, and thus accuracy of speaker recognition may be improved.

Operations S1150 and S1160 of generating the speaker characteristic value and operations S1120 to S1140 of assigning the weights are not necessarily performed sequentially, but the two processes may be performed simultaneously, or some operations of a process for assigning the weights may be performed first, and then operations S1150 and S1160 of generating the speaker characteristic value may be performed. For example, the resonator sensor 100 shown in FIG. 1 may proceed to operation S1130 of determining a vowel from a speech of a speaker using four resonators having different bands, and at the same time, proceed to operation S1150 of generating a bitmap of a band gradient using signals output from all of the resonators R1, R2, . . . Rn.

Although the method of recognizing the speaker using both the bitmap of the band gradient and the vowel determination is described above, the speaker may be recognized using only the bitmap of the band gradient. For example, in a case where it is agreed to recognize the speaker by using a predetermined specific word (e.g., ‘start’), an authentication template may be recognized only by a personalized model corresponding to the word ‘start’ of a specific speaker. In this case, the speaker may be recognized using only the bitmap of the band gradient, and the vowel determination may be unnecessary. Alternatively, a large number of personalized models may be required for the authentication template for recognition even when a specific speaker randomly utters words, phrases, or sentences. In this case, personalized models may be classified by vowels, and a model corresponding to a determined vowel may be used for comparison for recognition. Also, the speaker may be recognized by applying a weight to a vowel corresponding to a characteristic value of the speaker that is generated using a method other than the bitmap of the band gradient.

As described above, in the speaker recognition method and apparatus using the resonator according to an example embodiment, the resonator sensor 100 may include a plurality of mechanical resonators of various types. The resonator sensor 100 may have various shapes, and shapes or arrangements of resonators included therein may be selected as needed. Center frequencies of the resonators included in the resonator sensor 100 may be changed by adjusting a length L of the supporting unit 14 shown in FIG. 2. The resonators of the resonator sensor 100 may be formed to have various center frequency intervals according to the needs of a user.

FIGS. 16 and 17 are diagrams showing examples in which center frequencies of a plurality of resonators of a resonator sensor 100 a are set at an equal interval according to an example embodiment.

Referring to FIG. 16, center frequencies of resonators Rm may be inversely proportional to a square of a resonator length, that is, a square of the length L of the supporting unit 14 shown in FIG. 2. Thus, as shown in FIG. 17, when differences in length between the adjacent resonators Rm are constant, the resonators Rm included in the resonator sensor 100 a may make a ratio of resonators having a center frequency in a relatively low frequency region higher than a ratio of resonators having a center frequency in a high frequency region.
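The effect of a constant length difference may be checked with a short numerical sketch. The proportionality constant `c`, the length units, and the chosen lengths are hypothetical and serve only to show the trend that follows from the inverse-square relation between center frequency and length.

```python
def center_frequencies(lengths, c):
    """Center frequency modeled as c / L**2 (c is a hypothetical constant)."""
    return [c / (length * length) for length in lengths]


# Eleven resonators whose lengths increase by a constant step (arbitrary units).
lengths = list(range(10, 21))
freqs = center_frequencies(lengths, c=40000.0)

# Split the covered frequency range at its midpoint and count each half.
midpoint = (min(freqs) + max(freqs)) / 2.0
low_count = sum(1 for f in freqs if f < midpoint)
high_count = sum(1 for f in freqs if f >= midpoint)
```

In this example the center frequencies span 100 to 400 (arbitrary units), and eight of the eleven resonators fall in the lower half of that range, which is consistent with the higher ratio of low-frequency resonators described above.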

FIGS. 18 and 19 are diagrams showing examples in which center frequencies of a plurality of resonators of a resonator sensor 100 b are set at a constant interval according to an example embodiment.

Referring to FIGS. 18 and 19, resonators Rn included in the resonator sensor 100 b may be formed such that differences in length between the resonators Rn adjacent to each other become smaller as the beam length of the resonators becomes shorter. In this case, the differences in the center frequencies of the resonators Rn may be set to have a uniform constant interval.
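Conversely, the lengths needed for uniformly spaced center frequencies may be sketched by inverting an inverse-square relation between center frequency and length. As before, the constant `c`, the units, and the target frequencies are hypothetical values chosen for illustration.

```python
import math


def lengths_for_frequencies(freqs, c):
    """Length giving each center frequency when frequency is modeled as c / L**2."""
    return [math.sqrt(c / f) for f in freqs]


# Uniformly spaced target center frequencies (arbitrary units).
targets = [100.0, 200.0, 300.0, 400.0]
beam_lengths = lengths_for_frequencies(targets, c=40000.0)

# Length difference between each resonator and its next-shorter neighbour.
steps = [a - b for a, b in zip(beam_lengths, beam_lengths[1:])]
```

The computed length differences shrink as the beams get shorter, which matches the arrangement described for the resonator sensor 100 b.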

FIGS. 20 and 21 are diagrams showing examples in which center frequencies of a plurality of resonators of a resonator sensor 100 c are set at an arbitrary interval according to an example embodiment.

Referring to FIGS. 20 and 21, the resonator sensor 100 c may be formed such that intervals of lengths of resonators Ro included in the resonator sensor 100 c do not have a specific regularity. For example, in FIG. 21, to increase a ratio of resonators having a center frequency in the range of 2000 Hz to 3000 Hz, lengths of the resonators in some periods may be adjusted.

As described above, in the speaker recognition method and device using the resonator according to an example embodiment, the resonator sensors 100, 100 a, 100 b, and 100 c may include resonators having resonance frequencies of equal and constant intervals or resonators having resonance frequencies of arbitrary bands.

FIG. 22 is a plan view showing a schematic structure of a resonator sensor 100 d including a plurality of resonators according to an example embodiment.

Referring to FIG. 22, the resonator sensor 100 d may include a supporting unit 30 having a cavity or a through hole 40 formed in a central portion thereof, and a plurality of resonators R extending from the supporting unit 30 and surrounding the cavity or the through hole 40. The resonators R1, R2, . . . Rn of the resonator sensor 100 extend in one direction in FIG. 1, whereas, as shown in FIG. 22, the resonator sensor 100 d according to an example embodiment may be formed to have various structures.

FIGS. 23 through 25 are graphs illustrating examples of variously changing bandwidths of a plurality of resonators of a resonator sensor according to an example embodiment.

In the resonator sensor according to an example embodiment, bands of the resonators may be adjusted (e.g., narrowed) in order to change frequency intervals of the bands of the resonators as necessary or improve a resolution of a specific band. For example, when a resonator frequency bandwidth in FIG. 23 is referred to as a reference bandwidth S11, in the case of FIG. 24, the resonators may be formed to have a bandwidth S12 narrower than the reference bandwidth S11. Further, as shown in FIG. 25, the resonators may be formed to have a bandwidth S13 wider than the reference bandwidth S11 of FIG. 23.

FIG. 26 is a graph showing that a bandwidth of a specific resonator among a plurality of resonators of a resonator sensor is set wide according to an example embodiment.

Referring to FIG. 26, a bandwidth S22 of specific resonators of the resonator sensor 100 that are used to determine a vowel of an input signal of FIG. 3 may be formed to be relatively larger than a bandwidth S21 of remaining resonators of the resonator sensor 100 such that a process of determining the vowel of the input signal may be performed more efficiently.

According to one or more example embodiments, a substantial length of speech is not required for speaker recognition, and accurate speaker recognition is possible even with a relatively short input signal. The efficiency of speaker recognition may be improved by determining a vowel in an input signal and using a limited comparison group for speaker recognition.

According to one or more example embodiments, a resonator sensor may not require Fourier transform, may maintain frequency band information, and may improve time resolution. Since only a difference between electrical signals of adjacent resonators is used, the influence of common noise may be removed.

The speaker recognition method and apparatus as described above may be applied to various fields. For example, the speaker recognition method and apparatus may operate or unlock a specific device employed or installed in a mobile device, a home, or a vehicle by accurately recognizing whether a speaker is a registered speaker through a speech signal.

While one or more exemplary embodiments have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the following claims.

What is claimed is:
 1. A method of recognizing a speaker, the method comprising: receiving a plurality of electrical signals corresponding to a speech of the speaker from a plurality of resonators having different resonance bands; obtaining a difference of magnitudes of the plurality of electrical signals corresponding to resonance bands; generating a bitmap of a band gradient by encoding the difference of magnitudes of the plurality of electrical signals; and recognizing the speaker based on the bitmap of the band gradient.
 2. The method of claim 1, wherein the encoding comprises converting the difference of magnitudes of the plurality of electrical signals into any one of three or more odd-number of values, by using one or more threshold values.
 3. The method of claim 2, wherein the three or more odd-number of values comprise values that have a same absolute value and opposite signs.
 4. The method of claim 1, wherein the recognizing the speaker further comprises registering the speaker by: generating a speaker model based on the bitmap of the band gradient; and registering the speaker model as an authentication template.
 5. The method of claim 4, wherein the recognizing the speaker further comprises determining whether the speaker is a registered speaker by: generating a characteristic value based on the bitmap of the band gradient; and determining whether the speaker is the registered speaker based on comparison between the characteristic value and the authentication template.
 6. The method of claim 1, wherein the recognizing the speaker comprises: determining a vowel included in the speech of the speaker based on the difference of magnitudes of the plurality of electrical signals.
 7. The method of claim 6, wherein the determining the vowel comprises: estimating relative positions of formants based on the difference of magnitudes of the plurality of electrical signals; and determining the vowel based on the relative positions of the formants.
 8. The method of claim 7, wherein the difference of magnitudes of the plurality of electrical signals is determined based on magnitudes of electrical signals received from four resonators of a resonator sensor.
 9. The method of claim 6, wherein the difference of magnitudes of the plurality of electrical signals comprises a plurality of differences of magnitudes of the plurality of electrical signals, and the recognizing the speaker further comprises: assigning a weight to a model corresponding to the determined vowel in an authentication template; generating the bitmap of the band gradient based on a first difference of magnitudes of the plurality of electrical signals, which is different from a second difference of magnitudes of the plurality of electrical signals used to determine the vowel; generating a characteristic value based on the bitmap of the band gradient; and recognizing whether the speaker is a registered speaker based on comparison between the characteristic value and the authentication template, in which the model corresponding to the determined vowel is assigned the weight.
 10. The method of claim 9, wherein the assigning the weight comprises assigning the weight to the model corresponding to the determined vowel that is higher than a weight assigned to a model corresponding to another vowel.
 11. The method of claim 10, wherein a number of the first difference of magnitudes of the plurality of electrical signals used to generate the bitmap of the band gradient is greater than a number of the second difference of magnitudes of the plurality of electrical signals used to determine the vowel.
 12. An apparatus for recognizing a speaker, the apparatus comprising: a resonator sensor comprising a plurality of resonators having different resonance bands, the plurality of resonators configured to output a plurality of electrical signals corresponding to a speech of the speaker; and a processor configured to obtain a difference of magnitudes of the plurality of electrical signals corresponding to resonance bands and recognize the speaker based on the difference of magnitudes of the plurality of electrical signals, wherein the processor is further configured to generate a bitmap of a band gradient by encoding the difference of magnitudes of the plurality of electrical signals and recognize the speaker based on the bitmap of the band gradient.
 13. The apparatus of claim 12, wherein the processor is further configured to encode the difference of magnitudes of the plurality of electrical signals by converting the difference of magnitudes of the plurality of electrical signals into any one of three or more odd-number of values, by using one or more threshold values.
 14. The apparatus of claim 12, wherein the processor is further configured to determine whether the speaker is a registered speaker based on comparison between a characteristic value determined based on the bitmap of the band gradient and a registered authentication template.
 15. The apparatus of claim 12, wherein the processor is further configured to determine a vowel included in the speech of the speaker based on the difference of magnitudes of the plurality of electrical signals.
 16. The apparatus of claim 15, wherein the processor is further configured to estimate relative positions of formants based on the difference of magnitudes of the plurality of electrical signals and determine the vowel based on the relative positions of the formants.
 17. The apparatus of claim 16, wherein the difference of magnitudes of the plurality of electrical signals is determined based on magnitudes of electrical signals received from four resonators of the resonator sensor.
 18. The apparatus of claim 16, wherein the difference of magnitudes of the plurality of electrical signals comprises a plurality of differences of magnitudes of the plurality of electrical signals, and the processor is further configured to: assign a weight to a model corresponding to the determined vowel in an authentication template, generate a characteristic value based on a first difference of magnitudes of the plurality of electrical signals, which is different from a second difference of magnitudes of the plurality of electrical signals used to determine the vowel, and recognize the speaker based on comparison between the characteristic value and the authentication template, in which the model corresponding to the determined vowel is assigned.
 19. The apparatus of claim 18, wherein the processor is further configured to assign the weight to the model corresponding to the determined vowel that is higher than a weight assigned to a model corresponding to another vowel.